and to investigate how gastrectomy affects gastric cancer

based on profiling faecal microbiome and metabolome

ntari, et al., 2020].

e definition and working principle of LDA

the nth observation is represented by a vector $\mathbf{x}$ and the label of

denoted by $y$. In terms of protease cleavage pattern discovery, $\mathbf{x}$

peptide, which is labelled by $y$ as either cleaved or non-cleaved. The

tion label for this type of data is binary. Normally a non-cleaved

is labelled by a zero, i.e., $y = 0$, and a cleaved peptide $\mathbf{x}$ is

by a one, i.e., $y = 1$. A general format of a classification model

below,

$$\hat{y} = f(\mathbf{x}, \mathbf{w}) \tag{3.1}$$

is a vector of model parameters, $f$ is a classification function, $\hat{y}$

prediction corresponding to $y$. In a well-constructed classifier, $\hat{y}$

be a numerical value close to zero if $y = 0$ and $\hat{y}$ should be a

numerical value close to one if $y = 1$. If a classification problem is linear,

classification model can be formulated as below,

$$\hat{y} = \mathbf{w}\mathbf{x} = x_{n1}w_1 + x_{n2}w_2 + \cdots + x_{nd}w_d \mapsto y \tag{3.2}$$

corresponds to the ith independent variable of vector $\mathbf{x}$ and $w_i$

for the ith weight in $\mathbf{w}$, which is used to weigh the contribution of

A vector-matrix format of a linear classifier is formulated as

where X is an input matrix and $\hat{\mathbf{y}}$ is an output vector

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{w} \tag{3.3}$$
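As a minimal sketch of Equation (3.3), the snippet below computes the output vector from an input matrix and a weight vector using plain Python lists. The toy data, the hand-picked weights and the 0.5 decision threshold are illustrative assumptions and are not taken from the text.

```python
def predict(X, w):
    """Return y_hat = X w, where X is a list of rows and w a list of weights."""
    return [sum(x_ni * w_i for x_ni, w_i in zip(row, w)) for row in X]


# Hypothetical toy data: four observations with two independent variables each.
X = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.1, 0.9],
    [0.0, 1.0],
]
y = [0, 0, 1, 1]   # 0 = non-cleaved, 1 = cleaved
w = [0.0, 1.0]     # hypothetical weights, chosen by hand rather than learned

y_hat = predict(X, w)

# Assumed decision rule: predictions nearer to one are called cleaved.
labels = [1 if value >= 0.5 else 0 for value in y_hat]
print(y_hat, labels)
```

With these weights the predictions for the two classes fall near zero and near one respectively, which is the behaviour expected of a well-constructed classifier.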

major part of LDA is to find the best projection direction to map a

dimensional genotype space (X) to a one-dimensional phenotype

. To make an LDA model work, the density of $\hat{\mathbf{y}}$ is required to be

Only when this bimodality is maximised should the projection

or the model parameters w be considered an optimal solution
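One common way to make "maximising bimodality" concrete is Fisher's criterion, which chooses w proportional to the inverse within-class scatter matrix times the difference of the class means. The sketch below assumes that criterion, which the passage above does not name explicitly, and restricts itself to two variables so that the 2x2 inverse can be written in closed form. The data and all function names are hypothetical.

```python
def mean(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter(rows, m):
    """2x2 scatter matrix: sum over rows of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(X0, X1):
    """Assumed Fisher solution: w = S_W^{-1} (m1 - m0), two variables only."""
    m0, m1 = mean(X0), mean(X1)
    s0, s1 = scatter(X0, m0), scatter(X1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(rows, w):
    """One-dimensional projection of each row onto w."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]


# Hypothetical toy data for the two classes (y = 0 and y = 1).
X0 = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
X1 = [[5.0, 6.0], [6.0, 5.0], [6.0, 7.0], [7.0, 6.0]]

w = fisher_direction(X0, X1)
p0, p1 = project(X0, w), project(X1, w)
print(w, p0, p1)
```

On this toy data the projected values of the two classes form two non-overlapping groups, so the density of the projections has two clearly separated modes.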